STATEMENT OF GOVERNMENT INTEREST

[0001] Portions of the present invention may have been made in 
conjunction with Government funding under contract DAAD17-02-C-0115, 
and there may be certain rights to the Government. 

RELATED APPLICATIONS 

[0002] This application claims the benefit of U.S. Provisional
Applications No. 60/585,719, filed July 6, 2004, and No. 60/585,632, filed
July 6, 2004. Each is incorporated in its entirety by reference.

FIELD OF THE INVENTION 

[0003] The present invention relates generally to the processing of 
numerical data and more particularly to circuit designs for arithmetic 
processing. 

BACKGROUND OF THE INVENTION 

[0004] Underlying advanced digital processing systems are smaller 
basic digital circuit building blocks. The simple building blocks are 
combined and arranged in such a manner as to provide extremely fast 
and sophisticated processing. 

[0005] A carry circuit is typically used in arithmetic units, such as
adders or subtracters, to process a carry operation and transfer the
carry signal to the following carry operation. Carry circuits can be
arranged to form other devices, such as accumulators, which can be
further expanded into devices such as direct digital synthesizers (DDS).

[0006] Existing accumulator architectures include a 4-bit adder-accumulator
using 2-bit adder blocks, generally described in C. G.
Eckroot and S. I. Long, "A GaAs 4-bit adder-accumulator circuit for
direct digital synthesis," IEEE Journal of Solid-State Circuits, vol. 23,
no. 2, pp. 573-580, Apr. 1988. The Eckroot and Long design details a
circuit consisting of adder, register and lookahead-carry logic.

[0007] The system uses 2-bit adder blocks that are cascadable to any
2N-bit width and form the basis for the pipelined accumulator. This is
particularly useful in applications where a larger bit width permits
greater output resolution. The pipelined structure of the adder-accumulator
allows for expansion to wider data words while
preserving high clock frequency operation.

[0008] To alleviate the 'slowest' part of the adder-accumulator
designs, the cascaded architecture allows for wide bit-width
accumulators without much of a speed penalty, since the frequency of
operation is determined by the feedback of the sum and the setup time of
the carry input. As the bit-width increases, the total number of
accumulators increases linearly, while the total number of registers
increases in a quadratic fashion:

\[ \#\,\text{accumulators} = \frac{\text{bits}}{2} \]

# registers = [quadratic expression in bits; only the terms bits^2 and bits are legible]

[0009] This existing logic circuit is a traditional 4-bit adder with carry
and propagates carry outputs. The interconnection of the 4-bit adders 
provides complete lookahead carry logic by partial coupling of the 2-bit 
registers. The power consumption of the registers becomes a dominant 
factor for accumulators with large bit-widths, thus limiting commercial 
applications that demand lower power implementations. 
[0010] In the general pipelined adder-accumulators, the circuits were 
complex because numerous latches were required for synchronization 
between stages. For adder-accumulators of 8 to 10 bits total resolution, 
a pipelined architecture using 2-bit adder blocks seemed to provide a 
reasonable compromise between circuit complexity and clock speed, 
with the disadvantages noted herein. Among the noted aspects of the
standard design, the gate propagation delays largely determined the 
maximum clock frequency. For example, the gate delay for the carry 
logic circuit using standard two-level series-gated ECL logic requires 
two cascaded gates. Numerous attempts have been made to increase the 
processing speed in a commercially viable manner. 

[0011] One improvement to the typical design uses individual 2-bit adder
blocks that contain internal pipelining and an architecture that
merges the logic and latching functions, as described by T. Mathew, S.
Jaganathan, D. Scott, S. Krishnan, Y. Wei, M. Urtega, M. Rodwell, and
S. Long, "2-bit Adder Carry and Sum Logic Circuits Clocking at 19 GHz
Clock Frequency in Transferred Substrate HBT Technology," in
Proceedings of IEEE International Conference on Indium Phosphide and
Related Materials, Nara, Japan, May 2001, pp. 505-508, and T. Mathew,
S. Jaganathan, D. Scott, S. Krishnan, Y. Wei, M. Urtega, M. J. W.
Rodwell, and S. Long, "2-bit adder: carry and sum logic circuits at 19
GHz clock frequency in InAlAs/InGaAs HBT technology," Electronics
Letters, vol. 37, no. 19, pp. 1156-1157, Sept. 2001. This system was
designed to increase the clock rate of the carry and sum logic circuits of
a 2-bit adder.

[0012] For this 2-bit adder block, the carry blocks and sum blocks
contain both logic functionality and latches; thus the clock inputs Φ1 and
Φ2 control these internal latches. The left and right sides of the adder
are driven by opposite clock phases, Φ1 and Φ2, resulting in the
computation and latching of a full 2-bit add operation in a single clock
cycle.

[0013] The modular 2-bit adder forms the basis for the pipelined 
accumulator. While a 4-bit accumulator is demonstrated, the 2-bit adder 
can be cascaded to an arbitrary 2N-bit width. This makes the adder- 
accumulator particularly useful in applications where the larger bit 
width allows for greater output resolution, such as direct digital 
synthesizer (DDS) applications. Additionally, the pipelined structure of
the adder-accumulator allows for the expansion to wider data words
while preserving high clock frequency operation.

WO 2006/012362

PCT/US2005/024010

[0014] As noted in the adder circuit, the 2-bit sum and carry operations
are as follows, wherein A_0 and B_0 are the two adder inputs, C_0 is the carry
input to the full adder, and S_0 is the sum logic:

\[ C_1 = A_0 \cdot B_0 + A_0 \cdot C_0 + B_0 \cdot C_0 \]
\[ C_2 = A_1 \cdot B_1 + A_1 \cdot C_1 + B_1 \cdot C_1 \]
\[ S_0 = A_0 \oplus B_0 \oplus C_0 \]
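
The carry and sum expressions can be exercised with a short Python sketch. This is only an illustration of the Boolean logic, not of the ECL circuit; the function names are hypothetical, and the expression for the second sum bit (S_1) is an assumption made by symmetry with S_0, since the text gives only C_1, C_2 and S_0.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Full-adder carry: 1 when at least two of the three inputs are 1."""
    return (a & b) | (a & c) | (b & c)


def add_2bit(a: int, b: int, c0: int) -> tuple:
    """Add two 2-bit values plus a carry-in; return (2-bit sum, carry-out C2)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    c1 = majority3(a0, b0, c0)  # C1 = A0.B0 + A0.C0 + B0.C0
    s0 = a0 ^ b0 ^ c0           # S0 = A0 xor B0 xor C0
    c2 = majority3(a1, b1, c1)  # C2 = A1.B1 + A1.C1 + B1.C1
    s1 = a1 ^ b1 ^ c1           # assumed by symmetry with S0
    return (s1 << 1) | s0, c2
```

In the described adder, C_1 and S_0 would be evaluated on one clock phase and C_2 and S_1 on the other; the sketch ignores timing and checks only the arithmetic.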

[0015] In order to reduce delays for the carry logic circuit using 
standard two-level series-gated ECL logic, which requires two cascaded 
gates, the AND-OR logic was realized as a single three-level series- 
gated ECL gate. This reduced the gate delay and somewhat improved 
overall performance. The clock frequency was further increased by 
merging the logic evaluation and latching (synchronization) resulting in 
a four-level series-gated structure. The Carry 1 and Sum 0 are computed 
on one clock phase. Carry 2 and Sum 1 are computed on the other clock 
phase. The full 2-bit add is computed in a single clock cycle. Two
latches are added in the design to match data phases; each of these latches
is half of a master/slave latch.

[0016] While generally useful, the carry and sum circuits typically
require four series-gated levels, while registers only require two series-gated
levels. Unless multiple power supplies are utilized, the extra
levels translate into unnecessary power consumption in the registers.
The problem of requiring multiple power supplies for a design in which the
carry and sum circuits both need four series-gated levels, while the
registers need only two, was heretofore unresolved.

[0017] The processing of numerical data is typically carried out in a
digital computer and consists of numerous schemas. One example
involves frequency synthesizers. The general requirements for
frequency generation are precise frequency control and fast
response. The underlying circuit design must therefore allow for high-speed,
efficient processing, as even minor reductions in the
processing time for a given operation can equate to significant
improvements when dealing with large number-crunching operations.

[0018] While carry/majority circuits are generally known, there are also
known limitations with respect to speed and power requirements. What
is needed, therefore, are designs and systems for an improved
carry/majority circuit for applications such as high speed accumulators
that will provide very fast processing. Such a system should also have
low power requirements and preferably utilize fabrication techniques
known in the industry and be readily integrated into higher assemblies.

SUMMARY OF THE INVENTION 

[0019] One embodiment of the present invention is a carry/majority 
circuit design including a single parallel gated level scheme of the 
carry/majority circuit that has a lower propagation delay and allows for 
higher clock rates. Depending upon the number/layout of transistor 
pairs, the circuit can be a carry circuit or majority circuit. 
[0020] According to one embodiment, all of the inputs of the carry 
circuit are on the same level allowing a lower propagation delay and 
higher clock rates in high speed accumulators. 

[0021] A further embodiment of the invention is a circuit design that
includes the n-way majority function, which takes n input bits and
outputs '1' if at least half of the inputs are '1'; otherwise it outputs '0'.
Carry/majority circuits are used in many different applications and
systems such as digital logic systems, adders, accumulators and direct
digital synthesizers (DDS).
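
The n-way majority function defined above can be stated in a few lines of Python. This is a minimal sketch with a hypothetical function name. It follows the "at least half" wording literally, so for an even number of inputs a tie yields '1'; for an odd number of inputs this is the same as "more than half".

```python
def majority(bits) -> int:
    """Return 1 if at least half of the input bits are 1, otherwise 0."""
    bits = list(bits)
    ones = sum(1 for b in bits if b)
    return 1 if 2 * ones >= len(bits) else 0
```

With three inputs this is the carry function of a full adder: the output is 1 when two or three inputs are 1.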

[0022] In one embodiment, the present design is a 4-bit adder-accumulator,
but instead of using multi-level series-gated logic for the
carry circuit, the present design uses single-level parallel-gated logic.
One of the single-level parallel-gated logic designs operates at a 41 GHz
clock frequency in InP DHBT technology. In one embodiment,
additional diodes are added to the carry circuit to preserve logic level
compatibility with other circuits in a chip implementation with a single
power supply. A further design enables operation at a supply voltage of
3.6 V.

[0023] The features and advantages described herein are not all- 
inclusive and, in particular, many additional features and advantages 
will be apparent to one of ordinary skill in the art in view of the 
drawings, specification, and claims. Moreover, it should be noted that 
the language used in the specification has been principally selected for 
readability and instructional purposes, and not to limit the scope of the 
inventive subject matter. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0024] Figure 1 shows a majority circuit with three inputs configured in
accordance with one embodiment of the present invention.

[0025] Figure 2 is a timing diagram of the output of the majority circuit
for the single-level parallel-gated carry circuit with cascaded latch
configured in accordance with one embodiment of the present invention.

[0026] Figure 3 illustrates a prior art buffer circuit.

[0027] Figure 4 shows a prior art carry circuit integrated with a latch.

[0028] Figure 5 is a schematic perspective showing a single-level
parallel-gated carry circuit configured in accordance with one
embodiment of the present invention.

[0029] Figure 6 is a schematic perspective drawing for a majority 
circuit with five inputs configured in accordance with one embodiment 
of the present invention. 

[0030] Figure 7 is a prior art sum circuit using three input XOR 
structures merged with the latch and having four input levels for the sum 
logic section. 

[0031] Figure 8 is a schematic drawing showing a three-level series-gated
sum circuit configured in accordance with one embodiment of the
present invention.

[0032] Figure 9 is a simplified block diagram of a direct digital 
synthesizer (DDS) configured in accordance with one embodiment of the 
present invention. 

DETAILED DESCRIPTION OF THE INVENTIONS 

[0033] The circuit of Figure 1 is a carry/majority circuit 5 that detects
when two or three of the inputs are high. The circuit in this embodiment
relies on differential emitter coupled logic (ECL) and has three
identical differential pairs; however, the implementation in ECL is not a
limitation, as other technologies can be employed. The differential pair
inputs in this example are illustrated as Ap/An, Bp/Bn, and
Cp/Cn. These differential inputs are respectively coupled to
differential pairs Q1/Q4, Q2/Q5, and Q3/Q6. For each differential pair,
the current is steered through the transistor with the higher input
voltage, wherein the current for each pair is represented as I1, I2 and I3
respectively.
[0034] There is a Top Rail coupled to the other end of the resistors 
R1/R2 that may be coupled to ground or a voltage supply depending 
upon the design. Likewise, the Lower Rail is coupled to the legs of the 
differential transistor pairs and can be coupled to a power supply or 
ground depending upon the particulars. For the illustrated circuit 5, if 
the Top Rail is coupled to ground, the Lower Rail is coupled to a 
negative supply. If the Top Rail is coupled to a positive supply, the 
Lower Rail is coupled to ground. 

[0035] Thus, if one of the differential inputs is a logical 'High', all of
the current from the corresponding differential pair flows through R1,
and the voltage of node Xn is reduced by an amount equal to R1
times the differential pair current I1, I2 or I3. Since the three
differential transistor pairs, Q1/Q4, Q2/Q5, and Q3/Q6, are connected in
parallel to R1 and R2, the node output Xp/Xn is the result of the sum of
the currents times the resistors. Thus, if none of the inputs are 'High',
then no current goes through R1, and node Xn has a voltage equal
to the top supply rail; all of the current goes through R2, and node Xp
has a voltage equal to the top supply rail minus three times the
differential pair current times R2. This results in a situation where the
voltage at node Xp is less than Xn and the output is a logical 'Low'.

[0036] If only one of the inputs is high, then one of the differential
currents goes through R1, and the node Xn has a voltage equal to the
top supply rail minus the differential current times R1. Two of the three
differential currents go through R2, and node Xp has a voltage equal to
the top supply rail minus two times the differential pair current times
R2. This results in a situation where the voltage at node Xp is less than
Xn and the output is a logical 'Low'.

[0037] If only two of the inputs are high, then two of the differential
currents go through R1, and the node Xn has a voltage equal to the top
supply rail minus two times the differential current times R1. One of
the three differential currents goes through R2, and node Xp has a
voltage equal to the top supply rail minus the differential pair current
times R2. This results in a situation where the voltage at node Xp is
greater than Xn and the output is a logical 'High'.

[0038] The carry/majority circuit is shown as a differential input 
circuit, but it could also be implemented as a single-ended input circuit 
if reference voltages are used for one side of each input. In general, this 
is done by tying the "n" inputs to a voltage reference midway between 
the logic voltage swing, and using only the "p" inputs for data inputs. 
Such configurations are well known to those skilled in the art. 

[0039] If three of the inputs are high, then all three of the differential
currents go through R1, and the node Xn has a voltage equal to the top
supply rail minus three times the differential current times R1. None of
the three differential currents goes through R2, and node Xp has a
voltage equal to the top supply rail. This results in a situation where the
voltage at node Xp is greater than Xn and the output is a logical 'High'.
The truth table of the circuit is shown in Table A, where a "1" is a
logical 'High', such as when Ap > An, and a "0" is a logical 'Low', such
as when Ap < An.

Table A

A  B  C  |  Output
0  0  0  |  0
0  0  1  |  0
0  1  0  |  0
0  1  1  |  1
1  0  0  |  0
1  0  1  |  1
1  1  0  |  1
1  1  1  |  1



[0040] Referring to Figure 2, the output of the three input carry circuit 
is illustrated. The output differential is displayed showing the voltage 
versus time with respect to the nodes Xn and Xp. With no inputs 
'High', there is a full differential between Xn and Xp as shown. With 
either one or two inputs 'High', there is a reduced differential as 
illustrated. When all three of the inputs are either High or Low, a full 
differential is seen across Xp and Xn, since all of the current is steered 
through one leg of the circuit. When one or two of the inputs are High, 
the differential across Xp and Xn is reduced, since 1/3 of the current is 
steered through one leg of the circuit while 2/3 of the current is steered 
through the other leg of the circuit. Although this method has a reduced 
differential for some input states, the differential across Xp and Xn is 
typically sampled by a latch which generates a full differential for 
propagation to subsequent stages. 
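
The full and reduced differentials described for the three-input circuit can be reproduced with a simple numerical model. The sketch below is an idealized illustration only: it assumes perfect current steering, equal load resistors R1 = R2 and equal tail currents, and the component values (1 mA, 100 ohms, top rail at 0 V) and function names are hypothetical and not taken from the specification.

```python
def node_voltages(inputs, v_top=0.0, i_tail=1e-3, r_load=100.0):
    """Return (Xp, Xn) for an idealized single-level parallel-gated circuit.

    Each 'High' input steers its tail current through R1 (pulling Xn down);
    each 'Low' input steers its tail current through R2 (pulling Xp down).
    """
    inputs = list(inputs)
    n_high = sum(1 for x in inputs if x)
    n_low = len(inputs) - n_high
    xn = v_top - n_high * i_tail * r_load
    xp = v_top - n_low * i_tail * r_load
    return xp, xn


def latched_output(inputs) -> int:
    """Model the following latch: restore a full logic level from sign(Xp - Xn)."""
    xp, xn = node_voltages(inputs)
    return 1 if xp > xn else 0


if __name__ == "__main__":
    for pattern in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
        xp, xn = node_voltages(pattern)
        print(pattern, f"Xp-Xn = {xp - xn:+.3f} V", "->", latched_output(pattern))
```

Under these assumptions the differential for one or two 'High' inputs is one third of the differential for zero or three 'High' inputs, which is why a latch or buffer is used to restore the full swing.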

[0041] The reduced differential areas can be compensated for by
feeding the output Xp/Xn into a buffer or a latch circuit as shown in 
Figure 3. The buffer or latch circuit is well known in the art and 
recovers the output to a full differential value. As noted in the prior art 
buffer circuit of Figure 3, the circuit relies on current steering and 
essentially all of the current flows through the transistor Q7 or Q8 with 
the higher input voltage. A similar structure is used in Figure 5, but 
with Q26/Q27, R12/R13 employing a clock control to restore the output 
of the single-level parallel-gated carry/majority circuit to a full
differential. It is possible to implement the buffer using an emitter 
coupled logic buffer, or by using a clocked latch or register. Other 
buffer implementations can also be implemented. 

[0042] A three input majority circuit is useful as a carry circuit for high 
speed accumulators that integrate carry logic 100 and latch 110. 
Referring to Figure 4, the prior art circuit illustrates that the inputs to 
transistors Q10-Q19 of the carry circuit are on three different voltage 
levels, and the overall circuit has four series-gated levels. The lower 
voltage levels switch at a slower speed than the upper levels. Such a 
design requires a higher voltage which translates into more power in 
register stages. In one example, the four series-gated levels constrain
the supply to 5.5V. 

[0043] The accumulator with the modified carry circuit according to 
one embodiment is shown in Figure 5. The circuit logic 200 is still 
merged with the latch 210, but the carry logic 200 is reduced to one 
parallel gated level and the entire circuit has two gated levels. Figure 5 
combines the carry/majority circuit of Figure 1 with a clocked latch 
circuit. The addition of the latch circuit recovers the output of the 
carry/majority circuit to a full differential, and it provides the timing 
control required for most sequential logic circuits. 

[0044] Referring again to Figure 5, the single-level parallel-gated logic
circuit has a carry logic section 200 and a latch section 210. The use
of single-level parallel-gated logic is well suited for the carry terms,
since the carry operation essentially detects when two or three of the
inputs are high. When all three of the inputs are either High or Low, a
full differential is seen across Xp and Xn, since all of the current is 
steered through one leg of the circuit. When one or two of the inputs are 
High, the differential across Xp and Xn is reduced, since 1/3 of the 
current is steered through one leg of the circuit while 2/3 of the current 
is steered through the other leg of the circuit. Although this method has 
a reduced differential for some input states, the differential across Xp 
and Xn is sampled by the latch 210 which generates a full differential 
for propagation to subsequent stages. 

[0045] As described herein, the present invention merges the
combinational logic functions with the latch operation. Furthermore, the
carry terms are implemented using a single-level parallel-gated logic
structure with a cascaded latch. This allows for a lower supply voltage
than state of the art designs while still operating at high clock
frequencies.

[0046] Figure 6 shows a five input majority circuit and is an extension
of the carry circuit of Figure 1 to more than three (3) inputs. For the
extended majority circuits, the output is 'High' if more than half (or a
majority) of the inputs Ap-Ep are 'High'. The single-level parallel-gated
majority circuit has five differential pairs with transistors Q30-Q39.
This embodiment details how the single-level parallel-gated structure
can be expanded to larger numbers of inputs without increasing the
number of levels or the voltage supply. If all of the tail currents are
equal, the basic structure of the majority circuit can be expanded by
adding differential pairs to implement a majority operation
for any odd number of inputs, such as 3, 5, 7, etc.

[0047] As shown in Figure 6, the majority circuit uses equal tail
currents I12-I16 for all of the differential pairs. It is possible for the
tail currents of differential pairs in the majority circuit to be modified to
give other functionality. This could be carried out for any number of
differential pairs, even or odd. This extension of the basic design could
yield valuable benefits depending on the desired operation, such as
giving certain inputs to the majority more weight than others.
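
The effect of unequal tail currents can be illustrated with a weighted-vote model. This is a hedged sketch, assuming ideal current steering and equal load resistors, so that the output is 'High' when the total current steered through R1 exceeds the total steered through R2. The function name and the example current ratios are hypothetical; the specification does not give values.

```python
def weighted_majority(inputs, tail_currents) -> int:
    """Return 1 when the current steered by 'High' inputs exceeds that of 'Low' inputs.

    tail_currents are relative weights (any consistent unit). An exact tie
    gives no differential in a real circuit; this model returns 0 in that case.
    """
    inputs = list(inputs)
    tail_currents = list(tail_currents)
    if len(inputs) != len(tail_currents):
        raise ValueError("one tail current is required per input")
    i_r1 = sum(i for x, i in zip(inputs, tail_currents) if x)
    i_r2 = sum(i for x, i in zip(inputs, tail_currents) if not x)
    return 1 if i_r1 > i_r2 else 0
```

With equal currents `[1, 1, 1, 1, 1]` this reduces to the five-input majority of Figure 6. With, for example, `[3, 1, 1]` the first input alone outvotes the other two.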

[0048] Figure 7 shows the existing sum circuit design 300 merged with
a latch circuit 310. This prior design employs three levels of input for 
the XOR structures plus one level of input for the latch, which 
constrains the circuit to a minimum of four levels. Since many of the 
components of a large digital system are registers that need only two 
levels, the sum circuit increases the voltage supply above what is 
required for the register circuit and is an inefficient use of power. There 
are four voltage levels in this design that result in a greater power draw. 
Existing carry and sum circuits are merged with latches as a basis for an 
accumulator architecture. While the carry and sum circuits and their 
respective latches are integrally processing the data, the major path is 
through the carry circuits in this design as long as the sum is fast 
enough. The carry '1' and sum '0' are computed on one clock phase
while the carry '2' and sum '1' are computed on the other clock phase.
The full 2-bit add is computed in one clock cycle. There may be latches 
added in to match data phases. 

[0049] Referring to Figure 8, another embodiment is depicted with a
modified sum logic 400 and latch circuit 410 that reduce the power
consumption. The known sum circuit of Figure 7, with its four-level
series-gated design, prevents the power supply from
being lowered. An improvement of the present invention is an
alternative sum circuit using fewer voltage levels. In one experiment, a
circuit of the present design resulted in an approximately 15% reduction in
power.

[0050] There are two separate XOR gates and the second gate is merged 
with the latch circuit. This embodiment has three series-gated levels as 
compared to the existing design that has four levels. Since the previous 
sum circuit was the only portion of the circuit constraining the design to 
a power supply supporting four series-gated levels, the present design 
allows for overall power reduction. This is achieved by the removal of 
one diode drop from the power supply and other circuitry in the chip 
design. The registers in the pipeline benefit from this change in terms 
of power consumption, particularly in designs with large bit-widths. 
The inputs to the first stage are settled before the clock on the second stage
is active, providing lower power consumption with no degradation in
speed.
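
Splitting the three-input XOR into two cascaded two-input XOR gates does not change the logic function, as the following sketch checks. Which inputs feed the first gate is an assumption made here for illustration; the function name is hypothetical.

```python
def sum_bit_two_stage(a: int, b: int, c: int) -> int:
    """Sum bit computed as two cascaded 2-input XORs instead of one 3-input XOR."""
    first_stage = a ^ b       # first XOR gate (assumed to take the two addends)
    return first_stage ^ c    # second XOR gate, the one merged with the latch
```

The result equals the low bit of a + b + c for every input combination, so the full-adder sum is preserved while the deepest gate needs fewer series-gated levels.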

[0051] Direct digital synthesizers (DDS) are one of the implementations 
of frequency synthesizers and are useful as a means for generating 
frequency-agile waveforms with complex modulation. These devices 
offer certain advantages over the phase locked loop (PLL) designs used 
in a wide array of applications in fields such as communications systems 
and radars. 

[0052] Referring to Figure 9, a simplified diagram of a DDS is shown. 
In general, the waveforms are stored in memory and the system steps 
through the memory at a fixed rate. Using many small steps equates to a 
lower frequency and fewer larger steps equates to a higher frequency. 
The DDS employs an adder-accumulator which in turn may utilize the 
carry circuit with the integrated latch as well as the sum logic circuit 
with integrated latch. 

[0053] There is typically some form of a digital signal processor (DSP) 
510 that generates the input word to the phase accumulator and controls 
the frequency of the generated sine wave 560. The DSP 510 establishes 
the frequency command phase increment, and sets the size of the phase 
steps, thereby establishing the frequency. In general, any type of data 
register will work, wherein the number of bits determines the frequency 
resolution. 

[0054] The DDS typically has an adder-accumulator section 530
wherein the accumulator 520 is clocked by an oscillator 525 and adds
the increment from the frequency command to the previously stored value at
each pulse. As noted, the input word (frequency control word) from the
DSP 510 to the adder-accumulator 530 controls the frequency of the
generated sine wave. The adder 515 and accumulator 520 are configured
in a feedback configuration. The adder 515 generally must be a very
fast N-bit adder and has typically been the bottleneck in the processing.
The reference oscillator is typically at least twice as fast as the
frequency of the sine output and in practicality establishes the maximum
output sine frequency. 
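
The behavior of the phase accumulator can be sketched in Python as a register that adds the frequency control word on every clock and wraps at 2^N. The function names are hypothetical. The output-frequency relation used below, f_out = FCW x f_clk / 2^N, is the standard DDS tuning equation and is not stated explicitly in this specification.

```python
def phase_accumulator(fcw: int, n_bits: int, n_clocks: int, start: int = 0) -> list:
    """Return the accumulator value after each of n_clocks clock pulses."""
    mask = (1 << n_bits) - 1
    phase = start
    history = []
    for _ in range(n_clocks):
        phase = (phase + fcw) & mask  # N-bit add with wrap-around (carry-out dropped)
        history.append(phase)
    return history


def output_frequency(fcw: int, n_bits: int, f_clk: float) -> float:
    """Standard DDS tuning relation: one output cycle per accumulator overflow."""
    return fcw * f_clk / (1 << n_bits)
```

For a 4-bit accumulator and a control word of 3, `phase_accumulator(3, 4, 6)` steps through 3, 6, 9, 12, 15 and then wraps to 2. A larger control word takes fewer, larger steps per cycle and so gives a higher output frequency, and a wider accumulator gives finer frequency resolution.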

[0055] In one embodiment, the adder-accumulator 530 employs the 
carry circuit having the single-level parallel gated design as detailed 
herein. In a further embodiment, the invention comprises the carry 
circuit having the single-level parallel gated design and the sum circuit 
employing two separate XOR gates as described herein. 

[0056] The phase accumulator 520 is coupled to the Phase to Amplitude
Converter 535, which takes the phase information and converts
it into the values of a sine wave by addressing the
sine Read Only Memory (ROM) in the Converter 535. The ROM stores
the values of the sine wave. As part of the design, the number of phase bits
passed to the Converter 535 must match the number of address lines on its
ROM, so the Converter 535 cannot use all the bits in the accumulator. The Converter 535
output is presented to the digital to analog converter (DAC) 540 which 
develops a quantized analog sine wave. The DAC 540 determines the 
harmonic noise (uncertainty), wherein an 8 bit DAC has a -48dB signal 
to noise ratio (SNR) while a 12 bit DAC has a -72dB SNR. The size of 
the DAC is also used in selecting the size of the ROM of the Converter 
535. The DAC 540 generally is high speed, voltage output and has a 
low output impedance. 
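
The phase-to-amplitude step can be sketched as a sine lookup addressed by the most significant accumulator bits. This is an illustrative model only: the function names, the signed integer coding of the DAC samples, and the bit widths in the usage note are assumptions, not details from the specification.

```python
import math


def build_sine_rom(addr_bits: int, dac_bits: int) -> list:
    """Build one full sine cycle quantized to signed dac_bits-wide samples."""
    depth = 1 << addr_bits
    full_scale = (1 << (dac_bits - 1)) - 1
    return [round(full_scale * math.sin(2 * math.pi * k / depth)) for k in range(depth)]


def phase_to_amplitude(phase: int, acc_bits: int, rom: list, addr_bits: int) -> int:
    """Address the ROM with the top addr_bits of the accumulator value."""
    address = phase >> (acc_bits - addr_bits)  # lower accumulator bits are discarded
    return rom[address]
```

For example, with a 16-bit accumulator and a 4-bit ROM address, only the top four bits select the sample; the remaining twelve bits still set the frequency resolution but do not reach the ROM.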

[0057] There is often a filter section 545 that removes high frequency 
sampling components and provides a pure sine wave output to an 
amplifier unit 550 that generates the output sine wave 560. Typically, a 
low pass filter (LPF) is used such as a passive LC configuration. 
[0058] The output from the filter 545 may be amplified and results in 
the sine output 560. As known in the art, at DDS frequencies close to 
one half the clock frequency, the data becomes more difficult to filter. 
Therefore, in practice, the DDS operation is usually limited to 
approximately 40% of the clock frequency. 

[0059] High-speed accumulators are frequently used as a benchmark to 
demonstrate the intrinsic speed and the ability to yield moderately high 
device count circuits in InP double heterojunction bipolar transistor 
(DHBT) technology. The high speed accumulator is of particular
interest as a building block for direct digital synthesizers (DDS), as
disclosed in A. Gutierrez-Aitken, J. Matsui, E. N. Kaneshiro, B. K.
Oyama, D. Sawdai, A. K. Oki, and D. C. Streit, "Ultrahigh-speed direct
digital synthesizer using InP DHBT technology," IEEE J. Solid-State
Circuits, vol. 37, no. 2, pp. 1115-1119, Sept. 2002. As noted, the
frequency range and resolution are largely determined by the accumulator
clock frequency and data word width. In order to achieve both a high
clock frequency and a wide data word width, a combination of modular 
design and pipelining can be employed in an advanced III-V process. 
Certain combinational techniques are described in C. G. Eckroot and S. 
I. Long, "A GaAs 4-bit adder-accumulator circuit for direct digital 
synthesis," IEEE J. Solid-State Circuits, vol. 23, no. 2, pp. 573-580, 
Apr. 1988. 

[0060] While not limited to DDS, the high-speed accumulator circuits 
are an important component of the direct digital synthesizers. To allow 
direct generation of these waveforms at radio frequencies up to X-band, 
the accumulator circuit must operate at clock rates >30GHz, thus they 
benefit from the inherent high-speed of InP DHBT devices. The 
accumulator must also have a wide bit width in order to provide 
adequate frequency resolution, thus requiring transistor counts 
approaching 5000 devices. 

[0061] The adder-accumulator is modular and pipelined, allowing for 
expansion to wider data words, while preserving high clock frequency 
operation. The adder-accumulator also employs a single-level parallel- 
gated carry circuit. This allows for operation at high clock frequencies 
while taking a step towards reduced power consumption. In one
embodiment, the present invention demonstrates the inherent speed and yield
of the InP DHBT process with an accumulator circuit
operating at a 41 GHz clock frequency with over 600 transistors. By
modifying the sum circuit and reducing the power supply from the
previous design, it was possible to simulate a reduction in the core
power consumption of over 16% while maintaining high frequency
operation at 40 GHz. While this embodiment was designed near peak f_t
for maximum speed performance, further reductions in power can be
made by reducing the supply voltage to 3.6 V and decreasing the current
density, at the expense of a lower clock frequency of 30 GHz.

[0062] One embodiment used InP DHBT technology with f_t and f_max
both over 300 GHz. This technology is generally described by G. He, J.
Howard, M. Le, P. Partyka, B. Li, G. Kim, R. Hess, R. Bryie, R. Lee, S.
Rustomji, J. Pepper, M. Kail, M. Helix, R. Elder, D. Jansen, N. E. Harff,
J. Prairie, and E. S. Daniel, "Self-aligned InP DHBT with f_t and f_max
both over 300 GHz in a new manufacturable technology," IEEE Electron
Device Letters, 2004, submitted for publication. However, the present
invention is not restricted or limited to this particular technology.
[0063] Thus, one embodiment of the present invention realized a 4-bit 
adder-accumulator test circuit in InP DHBT technology with a maximum 
clock frequency of 41 GHz. 

[0064] One embodiment of the accumulator of the present invention is
an InP 4-bit accumulator operating at a 41 GHz clock frequency
with a power consumption of 4.1 W, such as is disclosed in
S. E. Turner, D. S. Jansen, and D. E. Kotecki, "4-bit adder-accumulator
at 41 GHz clock frequency in InP DHBT technology," IEEE Microwave
and Wireless Components Letters, and in S. Turner, R. Elder, D. Jansen, and D.
Kotecki, "4-Bit Adder-Accumulator at 41-GHz Clock Frequency in InP
DHBT Technology," IEEE Microwave and Wireless Components Letters,
Vol. 15, No. 3, pp. 144-146, March 2005, the contents of which are
incorporated herein by reference. This particular design used a multi-level
circuit topology requiring a 5.5 V supply voltage for some sub-circuits,
which leads to the relatively high power dissipation. One aspect
includes fabricating the transistors using processes such as Vitesse VIP-2.
[0065] Another embodiment of the present invention provides a circuit
that allows the overall power supply voltage to be reduced by a diode
drop, while maintaining high clock frequency operation. Simulations of
a 4-bit accumulator with this circuit show operation at about a 40 GHz
clock frequency with a power consumption of 3.4 W. In both instances,
the circuits are designed for maximum speed and operate near peak f_t.
This application discloses the architecture of the accumulator, the 
design of the previously reported circuit, and the modifications 
contained in the new benchmarks. 
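
As a rough arithmetic check on the two power figures quoted in this description (4.1 W for the reported 41 GHz design and 3.4 W for the simulated modified design), the relative saving can be computed as below. The helper name is hypothetical, and comparing a reported figure directly with a simulated one is an assumption made only for illustration.

```python
def relative_reduction(before_w: float, after_w: float) -> float:
    """Fractional power saving going from before_w to after_w (both in watts)."""
    if before_w <= 0:
        raise ValueError("reference power must be positive")
    return (before_w - after_w) / before_w


saving = relative_reduction(4.1, 3.4)
print(f"{saving:.1%}")  # prints 17.1%
```

This is in line with the reduction of over 16% in core power consumption stated for the modified sum circuit and lower supply voltage.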

[0066] The foregoing description of the embodiments of the invention 
has been presented for the purposes of illustration and description. It is 
not intended to be exhaustive or to limit the invention to the precise 
form disclosed. Many modifications and variations are possible in light 
of this disclosure. It is intended that the scope of the invention be 
limited not by this detailed description, but rather by the claims 
appended hereto. 